Abstract

We introduce a new parametrization for the two-parameter species sampling model with a finite but random number of different species, recently introduced in Gnedin (2010a). We show that the reparametrization yields a representation in terms of a generalized Waring mixture of Fisher species sampling models, and we derive the structural distribution of the model.
1 Introduction

Gnedin (2010a) introduces a two-parameter family of exchangeable partition models belonging to the Gibbs class of genus $\alpha = -1$ (Gnedin and Pitman, 2006), obtained by suitably mixing Fisher's (1943) $(-1, \xi)$ partitions over the number $\xi$ of boxes. The resulting $(\gamma, \zeta)$ Gnedin-Fisher species sampling model has exchangeable partition probability function (EPPF) of the form

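$$
p_{\gamma,\zeta}(n_1,\dots,n_k)=\frac{(\gamma)_{n-k}\prod_{i=1}^{k-1}\left(i^2-\gamma i+\zeta\right)}{\prod_{i=1}^{n-1}\left(i^2+\gamma i+\zeta\right)}\prod_{j=1}^{k}n_j!, \tag{1}
$$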

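with $(x)_a = \Gamma(x+a)/\Gamma(x)$ the rising factorial (so that $(x)_m = x(x+1)\cdots(x+m-1)$ for integer $m$). This EPPF is obtained by a sequential construction with one-step allocation rules

$$
(O):\ p_j(n):=\frac{(n-k+\gamma)(n_j+1)}{n^2+\gamma n+\zeta},\quad j=1,\dots,k,\qquad (N):\ p_{k+1}(n):=\frac{k^2-\gamma k+\zeta}{n^2+\gamma n+\zeta},
$$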

for $\gamma > 0$ and either (i) $i^2 - \gamma i + \zeta$ is strictly positive for all $i \in \mathbb{N}$, or (ii) the quadratic is positive for $i \in \{1, \dots, i_0 - 1\}$ and has a root at $i_0$. (See Pitman, 2006, and Hansen and Pitman, 2000, for background on exchangeable partitions and sequential constructions.) By Theorem 1 in Gnedin (2010a), model (1) arises by mixing Poisson-Dirichlet $(-1, \xi)$ models (Pitman and Yor, 1997), for $\xi = 1, 2, 3, \dots$, over $\xi$ with



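$$
P_{\gamma,\zeta}(K=\xi)=\frac{\Gamma(z_1+1)\,\Gamma(z_2+1)}{\Gamma(\gamma)}\,\frac{\prod_{i=1}^{\xi-1}\left(i^2-\gamma i+\zeta\right)}{\xi!\,(\xi-1)!},\qquad \xi=1,2,\dots, \tag{2}
$$

for some complex $z_1$ and $z_2$ with $z_1 + z_2 = \gamma$ and $z_1 z_2 = \zeta$. For $\zeta = 0$ we have $\gamma \in (0,1)$ (cf. Gnedin, 2010a, Sect. 6), and (1) reduces to

$$
p_{\gamma}(n_1,\dots,n_k)=\frac{(\gamma)_{n-k}\,(1-\gamma)_{k-1}\,(k-1)!}{(1+\gamma)_{n-1}\,(n-1)!}\prod_{j=1}^{k}n_j!. \tag{3}
$$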
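For readers who want to experiment with the sequential construction, the rules (O) and (N) of model (1) translate directly into code. The Python sketch below is our own illustration, not code from the source: `gf_rule` and `simulate` are hypothetical helper names, and the parameters are assumed to satisfy condition (i) or (ii), so that all weights are nonnegative.

```python
import random

def gf_rule(counts, gamma, zeta):
    """One-step probabilities of the (gamma, zeta) Gnedin-Fisher model.

    counts holds the occupancy counts (n_1, ..., n_k). The result lists rule (O)
    for each old box j = 1..k, followed by rule (N) for a new box.
    """
    n, k = sum(counts), len(counts)
    denom = n * n + gamma * n + zeta
    old = [(n - k + gamma) * (nj + 1) / denom for nj in counts]
    new = (k * k - gamma * k + zeta) / denom
    return old + [new]

def simulate(n_balls, gamma, zeta, rng):
    """Grow a partition of [n_balls] ball by ball, starting from box B_1 = {1}."""
    counts = [1]
    for _ in range(n_balls - 1):
        probs = gf_rule(counts, gamma, zeta)
        j = rng.choices(range(len(probs)), weights=probs)[0]
        if j == len(counts):
            counts.append(1)   # rule (N): open a new box
        else:
            counts[j] += 1     # rule (O): add the ball to old box j
    return counts

rng = random.Random(0)
print(simulate(100, 0.5, 0.0, rng))            # zeta = 0: one-parameter model (3)
print(len(simulate(100, 6.0, 9.0, rng)) <= 3)  # case (ii) with root i_0 = 3: prints True
```

With $\gamma = 6$ and $\zeta = 9$ the quadratic $i^2 - 6i + 9 = (i-3)^2$ vanishes at $i_0 = 3$, so rule (N) has probability zero once three boxes are occupied, which illustrates case (ii).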

The sequence of the numbers of occupied boxes $K_n$, for both model (1) and model (3), is a nondecreasing Markov chain with increments in $\{0, 1\}$ and transition probabilities determined by the specific rule (N). Its distribution follows from the general formula for Gibbs partitions of genus $\alpha \in (-\infty, 1)$

$$
P(K_n=k)=V_{n,k}\,S_{n,k}^{-1,-\alpha}, \tag{4}
$$

where $V_{n,k}$ are the general Gibbs weights, satisfying the backward recursion $V_{n,k} = (n - k\alpha)V_{n+1,k} + V_{n+1,k+1}$, and $S_{n,k}^{-1,-\alpha}$ are the generalized Stirling numbers. For $\alpha = -1$ these reduce to the Lah numbers (see e.g. Charalambides, 2005) $S_{n,k}^{-1,1} = \binom{n-1}{k-1}\frac{n!}{k!}$; hence, e.g., for the one-parameter model

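$$
P_\gamma(K_n=k)=\binom{n-1}{k-1}\frac{n!}{k!}\,\frac{(\gamma)_{n-k}\,(1-\gamma)_{k-1}\,(k-1)!}{(1+\gamma)_{n-1}\,(n-1)!}=\binom{n}{k}\frac{(\gamma)_{n-k}\,(1-\gamma)_{k-1}}{(1+\gamma)_{n-1}}. \tag{5}
$$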
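Formula (5) can be cross-checked against the Markov chain description of $K_n$. The following Python sketch is illustrative only; `law_closed` and `law_chain` are hypothetical names. It assumes $\zeta = 0$ and $\gamma \in (0,1)$, propagates the exact law of $K_n$ from $K_1 = 1$ using rule (N), and compares it with the closed form $\binom{n}{k}(\gamma)_{n-k}(1-\gamma)_{k-1}/(1+\gamma)_{n-1}$.

```python
from math import comb

def rising(x, m):
    """Rising factorial (x)_m = x (x+1) ... (x+m-1), integer m >= 0."""
    out = 1.0
    for i in range(m):
        out *= x + i
    return out

def law_closed(n, gamma):
    """P_gamma(K_n = k) for k = 1..n, from the closed form (5)."""
    return [comb(n, k) * rising(gamma, n - k) * rising(1 - gamma, k - 1)
            / rising(1 + gamma, n - 1) for k in range(1, n + 1)]

def law_chain(n, gamma, zeta=0.0):
    """Same law obtained by propagating K_1, ..., K_n under rule (N)."""
    dist = [0.0, 1.0]                      # dist[k] = P(K_m = k), starting at m = 1
    for m in range(1, n):
        nxt = [0.0] * (m + 2)
        for k in range(1, m + 1):
            p_new = (k * k - gamma * k + zeta) / (m * m + gamma * m + zeta)
            nxt[k + 1] += dist[k] * p_new      # a new box is opened
            nxt[k] += dist[k] * (1 - p_new)    # the ball joins an old box
        dist = nxt
    return dist[1:n + 1]

closed, chain = law_closed(12, 0.3), law_chain(12, 0.3)
print(max(abs(a - b) for a, b in zip(closed, chain)))  # discrepancy at rounding level
```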

As in Gnedin (2010a, cf. eqs. (9) and (10)), the mixing law yielding the one-parameter model (3) arises from (5) for $n \to \infty$ by the standard asymptotics $\Gamma(n+a)/\Gamma(n+b) \sim n^{a-b}$, and corresponds to

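$$
P_\gamma(K=\xi)=\frac{\gamma\,(1-\gamma)_{\xi-1}}{\xi!}, \tag{6}
$$

for $\xi = 1, 2, \dots$ and $\gamma \in (0,1)$, while, for $1 \le k \le n$, the posterior distribution of $K$ given $K_n = k$ is

$$
P_\gamma(K=\xi\mid K_n=k)=\binom{\xi-1}{k-1}\frac{(k-\gamma)_{\xi-k}\,(n+\gamma-k)_k}{(n)_\xi},\qquad \xi=k,k+1,\dots \tag{7}
$$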
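The passage to the limit leading to (6), and the normalization of the posterior, can be checked numerically. In this illustrative Python sketch (hypothetical helper names; log-Gamma arithmetic is used to avoid overflow) the posterior is coded in the form $\binom{\xi-1}{k-1}(k-\gamma)_{\xi-k}(n+\gamma-k)_k/(n)_\xi$, which is proportional in $\xi$ to prior (6) times the PD$(-1,\xi)$ likelihood of $K_n = k$. Truncating its sum at a large $\xi$ is harmless because the masses decay like $\xi^{-(n+\gamma-k+1)}$.

```python
from math import comb, exp, lgamma, log

def log_rising(x, m):
    """log (x)_m = log Gamma(x+m) - log Gamma(x), valid for x > 0 and x + m > 0."""
    return lgamma(x + m) - lgamma(x)

def law_kn(n, k, gamma):
    """P_gamma(K_n = k) from (5), on the log scale."""
    return exp(log(comb(n, k)) + log_rising(gamma, n - k)
               + log_rising(1 - gamma, k - 1) - log_rising(1 + gamma, n - 1))

def prior(xi, gamma):
    """Mixing law (6): gamma (1 - gamma)_{xi-1} / xi!."""
    return exp(log(gamma) + log_rising(1 - gamma, xi - 1) - lgamma(xi + 1))

def posterior(xi, n, k, gamma):
    """Posterior of K = xi given K_n = k, for xi >= k."""
    return exp(log(comb(xi - 1, k - 1)) + log_rising(k - gamma, xi - k)
               + log_rising(n + gamma - k, k) - log_rising(n, xi))

g = 0.4
print([round(law_kn(4000, k, g) / prior(k, g), 4) for k in (1, 2, 3)])  # ratios close to 1
print(sum(posterior(xi, 8, 3, g) for xi in range(3, 20001)))           # close to 1
```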

2 The new parametrization 

Gnedin (2010a, cf. Sect. 2) points out that the Gibbs weights of model (1) can be split into linear factors by factoring the quadratics as

$$
x^2+\gamma x+\zeta=(x+z_1)(x+z_2)\quad\text{and}\quad x^2-\gamma x+\zeta=(x+s_1)(x+s_2),
$$

thus providing the alternative five-parameter representation

$$
V_{n,k}^{\gamma,\zeta}=\frac{(\gamma)_{n-k}\,(s_1+1)_{k-1}\,(s_2+1)_{k-1}}{(z_1+1)_{n-1}\,(z_2+1)_{n-1}} \tag{8}
$$

for some complex $z_1, z_2, s_1, s_2$ such that $z_1 + z_2 = \gamma$, $z_1 z_2 = \zeta$, $s_1 + s_2 = -\gamma$ and $s_1 s_2 = \zeta$.



Here we show that these constraints restrict the admissible values of the four parameters $s_1$, $s_2$, $z_1$ and $z_2$, yielding an interesting alternative two-parameter representation of the weights of the Gnedin-Fisher model.

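Theorem 1. For $\psi \in [0,1)$ and $0 < \gamma < \psi + 1$, the EPPF of the two-parameter $(\gamma, \zeta)$ Gnedin-Fisher species sampling model (1) admits the following alternative representation

$$
p_{\gamma,\psi}(n_1,\dots,n_k)=\frac{(\gamma)_{n-k}\,(1-\psi)_{k-1}\,(1-\gamma+\psi)_{k-1}}{(1+\psi)_{n-1}\,(1+\gamma-\psi)_{n-1}}\prod_{j=1}^{k}n_j!. \tag{9}
$$

For $\psi = 0$ we have $\gamma \in (0,1)$, and (9) yields the one-parameter Gnedin-Fisher model (3).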

Proof. For $z_1 + z_2 = \gamma$, $z_1 z_2 = \zeta$, $s_1 + s_2 = -\gamma$ and $s_1 s_2 = \zeta$, the vectors $(z_1, z_2)$ and $(s_1, s_2)$ must be the roots (complex or real) of the following quadratic polynomials

$$
z_i^2-\gamma z_i+\zeta=0\quad\text{and}\quad s_i^2+\gamma s_i+\zeta=0,\qquad i=1,2.
$$

For $\gamma^2 - 4\zeta > 0$ the admissible real solutions are

$$
z_1,z_2=\frac{\gamma\pm\sqrt{\gamma^2-4\zeta}}{2},\qquad s_1,s_2=\frac{-\gamma\pm\sqrt{\gamma^2-4\zeta}}{2}.
$$

For $\gamma^2 - 4\zeta < 0$ the admissible complex solutions are

$$
z_1,z_2=\frac{\gamma\pm i\sqrt{4\zeta-\gamma^2}}{2},\qquad s_1,s_2=\frac{-\gamma\pm i\sqrt{4\zeta-\gamma^2}}{2}.
$$



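Now let either $\Delta = i\sqrt{4\zeta-\gamma^2}/2$ or $\Delta = \sqrt{\gamma^2-4\zeta}/2$, according to the sign of $\gamma^2 - 4\zeta$. Then, regardless of whether the solutions are real or complex, the vectors satisfying the constraints $z_1 + z_2 = \gamma$, $s_1 + s_2 = -\gamma$, $z_1 z_2 = \zeta$ and $s_1 s_2 = \zeta$ must be

$$
(z_1,z_2)=\left(\tfrac{\gamma}{2}+\Delta,\ \tfrac{\gamma}{2}-\Delta\right)\quad\text{or}\quad\left(\tfrac{\gamma}{2}-\Delta,\ \tfrac{\gamma}{2}+\Delta\right)
$$

and

$$
(s_1,s_2)=\left(-\tfrac{\gamma}{2}+\Delta,\ -\tfrac{\gamma}{2}-\Delta\right)\quad\text{or}\quad\left(-\tfrac{\gamma}{2}-\Delta,\ -\tfrac{\gamma}{2}+\Delta\right),
$$

which shows that the admissible solutions reduce to $z_1 = -s_1$ and $z_2 = -s_2$, or $z_1 = -s_2$ and $z_2 = -s_1$. Since (8) is invariant under permutations of $(z_1, z_2)$ and of $(s_1, s_2)$, the five parameters in (8) reduce to $\psi$ and $\gamma$ by setting $z_1 = \psi$, $z_2 = \gamma - \psi$, $s_1 = -\psi$ and $s_2 = \psi - \gamma$ (so that $\zeta = \psi(\gamma - \psi)$), thus yielding (9). Moreover, positivity of the numerator in (9) implies $1 - \psi > 0$ and $1 - \gamma + \psi > 0$. For $\psi = 0$ we have $0 < \gamma < 1$, and (9) reduces to (3) by standard combinatorial calculus, since $(1)_{k-1} = (k-1)!$ and $(1)_{n-1} = (n-1)!$. □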
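Theorem 1 lends itself to a direct numerical check: with $\zeta = \psi(\gamma - \psi)$ the EPPFs (1) and (9) should coincide, and (9) should sum to one over all set partitions of $[n]$. The Python sketch below is our own illustration; `eppf_zeta` and `eppf_psi` are hypothetical names, and $\gamma = 0.7$, $\psi = 0.4$ is an arbitrary admissible choice.

```python
from math import factorial, prod

def rising(x, m):
    """Rising factorial (x)_m for integer m >= 0 (the empty product is 1)."""
    return prod(x + i for i in range(m))

def eppf_zeta(counts, gamma, zeta):
    """EPPF (1) in the original (gamma, zeta) parametrization."""
    n, k = sum(counts), len(counts)
    num = rising(gamma, n - k) * prod(i * i - gamma * i + zeta for i in range(1, k))
    den = prod(i * i + gamma * i + zeta for i in range(1, n))
    return num / den * prod(factorial(c) for c in counts)

def eppf_psi(counts, gamma, psi):
    """EPPF (9) in the (gamma, psi) parametrization of Theorem 1."""
    n, k = sum(counts), len(counts)
    num = rising(gamma, n - k) * rising(1 - psi, k - 1) * rising(1 - gamma + psi, k - 1)
    den = rising(1 + psi, n - 1) * rising(1 + gamma - psi, n - 1)
    return num / den * prod(factorial(c) for c in counts)

gamma, psi = 0.7, 0.4
zeta = psi * (gamma - psi)            # z_1 = psi, z_2 = gamma - psi
for counts in ([1], [2, 1], [3, 1, 2], [5, 5]):
    print(counts, eppf_zeta(counts, gamma, zeta), eppf_psi(counts, gamma, psi))
```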

Remark 2. We stress that, by Lemma 5.2 in Gnedin (2010b), besides the extended two-parameter Poisson-Dirichlet $(\alpha, \theta)$ family of partition models, with $\alpha \in (0,1)$ and $\theta > -\alpha$, or $\alpha < 0$ and $\theta = |\alpha|\xi$, $\xi = 1, 2, \dots$, the two-parameter Gnedin-Fisher model is the only class of exchangeable random partitions with Gibbs weights in the multiplicative form

$$
V_{n,k}=\frac{\prod_{l=0}^{n-k-1}g_0(l)\,\prod_{l=1}^{k-1}g_1(l)}{\prod_{l=1}^{n-1}g(l)} \tag{10}
$$

for $g_0(\cdot)$, $g_1(\cdot)$ and $g(\cdot): \mathbb{N} \to \mathbb{R}$ satisfying the identity

$$
(n-\alpha k)\,g_0(n-k)+g_1(k)=g(n),\qquad 1\le k\le n,\ n\in\mathbb{N}.
$$

In terms of equation (10), the weights in (9) may be written as

$$
V_{n,k}^{\gamma,\psi}=\frac{\prod_{l=0}^{n-k-1}(l+\gamma)\,\prod_{l=1}^{k-1}(l-\psi)(l-\gamma+\psi)}{\prod_{l=1}^{n-1}(l+\psi)(l+\gamma-\psi)}
$$

for $g_0(l) = l + \gamma$, $g_1(l) = (l - \psi)(l - \gamma + \psi)$ and $g(l) = (l + \psi)(l + \gamma - \psi)$.
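The identity behind Remark 2, and the resulting Gibbs backward recursion with $\alpha = -1$, can be verified numerically. This Python sketch is illustrative only; the function names are hypothetical and $\gamma = 1.2$, $\psi = 0.5$ is an arbitrary admissible choice.

```python
from math import prod

def g0(l, gamma):
    return l + gamma

def g1(l, gamma, psi):
    return (l - psi) * (l - gamma + psi)

def g(l, gamma, psi):
    return (l + psi) * (l + gamma - psi)

def weights(n, k, gamma, psi):
    """V_{n,k} in the multiplicative form (10) with the (gamma, psi) choice of g0, g1, g."""
    num = (prod(g0(l, gamma) for l in range(0, n - k))
           * prod(g1(l, gamma, psi) for l in range(1, k)))
    return num / prod(g(l, gamma, psi) for l in range(1, n))

gamma, psi, alpha = 1.2, 0.5, -1
worst = 0.0
for n in range(1, 25):
    for k in range(1, n + 1):
        # identity (n - alpha k) g0(n - k) + g1(k) = g(n)
        assert abs((n - alpha * k) * g0(n - k, gamma) + g1(k, gamma, psi) - g(n, gamma, psi)) < 1e-9
        # backward recursion V_{n,k} = (n - alpha k) V_{n+1,k} + V_{n+1,k+1}
        lhs = weights(n, k, gamma, psi)
        rhs = (n - alpha * k) * weights(n + 1, k, gamma, psi) + weights(n + 1, k + 1, gamma, psi)
        worst = max(worst, abs(lhs - rhs) / lhs)
print(worst)  # relative discrepancy, expected at rounding level
```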

For the reparametrized model (9), multistep allocation rules may be derived by specializing the general form for Gibbs partitions of genus $\alpha \in (-\infty, 1)$ introduced in Cerquetti (2008), as follows. Start with box $B_{1,1}$ containing the single ball 1. At step $n$ the allocation of $n$ balls is a random partition $\Pi_n = (B_{n,1}, \dots, B_{n,K_n})$ of the set of balls $[n]$. Given that the number of boxes is $K_n = k$ and the occupancy counts are $(n_1, \dots, n_k)$, the partition of $[n+m]$ at step $n+m$ is obtained by randomly placing the additional $m$ balls (each probability below refers to one specific placement of the labelled balls):

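(AO): in the $k$ old boxes in configuration $(m_1, \dots, m_k)$, with $m_j \ge 0$ and $\sum_{j=1}^{k} m_j = m$, with probability

$$
p_{\mathbf m}(n)=\frac{(\gamma+n-k)_m}{(\psi+n)_m\,(\gamma-\psi+n)_m}\prod_{j=1}^{k}(n_j+1)_{m_j}; \tag{11}
$$

(AN): in $k^*$ new boxes in configuration $(s_1, \dots, s_{k^*})$, with $\sum_{j=1}^{k^*} s_j = m$, $1 \le k^* \le m$ and $s_j \ge 1$, with probability

$$
p_{\mathbf s}(n)=\frac{(\gamma+n-k)_{m-k^*}\,(k-\psi)_{k^*}\,(k-\gamma+\psi)_{k^*}}{(\psi+n)_m\,(\gamma-\psi+n)_m}\prod_{j=1}^{k^*}s_j!; \tag{12}
$$

(ON): $s < m$ balls in $k^*$ new boxes in configuration $(s_1, \dots, s_{k^*})$ and the remaining $m - s$ balls in the $k$ old boxes in configuration $(m_1, \dots, m_k)$, with $\sum_{j=1}^{k} m_j = m - s$, $1 \le s < m$, $\sum_{j=1}^{k^*} s_j = s$, $m_j \ge 0$ and $s_j \ge 1$, with probability

$$
p_{\mathbf s,\mathbf m}(n)=\frac{(\gamma+n-k)_{m-k^*}\,(k-\psi)_{k^*}\,(k-\gamma+\psi)_{k^*}}{(\psi+n)_m\,(\gamma-\psi+n)_m}\prod_{j=1}^{k}(n_j+1)_{m_j}\prod_{j=1}^{k^*}s_j!. \tag{13}
$$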
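The multistep rules can be sanity-checked against the EPPF: by exchangeability, the probability (13) of a given placement of the $m$ labelled balls should equal the ratio of the EPPF (9) after and before the placement. The Python sketch below is our own illustration with hypothetical names; setting `new_sizes = []` recovers (11), and setting all of `old_add` to zero recovers (12).

```python
from math import factorial, prod

def rising(x, m):
    """Rising factorial (x)_m for integer m >= 0."""
    return prod(x + i for i in range(m))

def eppf(counts, gamma, psi):
    """EPPF (9) of the (gamma, psi) Gnedin-Fisher model."""
    n, k = sum(counts), len(counts)
    num = rising(gamma, n - k) * rising(1 - psi, k - 1) * rising(1 - gamma + psi, k - 1)
    den = rising(1 + psi, n - 1) * rising(1 + gamma - psi, n - 1)
    return num / den * prod(factorial(c) for c in counts)

def multistep(counts, old_add, new_sizes, gamma, psi):
    """Probability (13) of one specific placement of m labelled balls:
    old_add[j] balls join old box j and new_sizes lists the new boxes."""
    n, k, ks = sum(counts), len(counts), len(new_sizes)
    m = sum(old_add) + sum(new_sizes)
    num = rising(gamma + n - k, m - ks) * rising(k - psi, ks) * rising(k - gamma + psi, ks)
    den = rising(psi + n, m) * rising(gamma - psi + n, m)
    old = prod(rising(c + 1, a) for c, a in zip(counts, old_add))
    new = prod(factorial(s) for s in new_sizes)
    return num / den * old * new

counts, gamma, psi = [2, 1, 3], 0.9, 0.3
old_add, new_sizes = [1, 0, 2], [2, 1]
grown = [c + a for c, a in zip(counts, old_add)] + new_sizes
print(multistep(counts, old_add, new_sizes, gamma, psi))
print(eppf(grown, gamma, psi) / eppf(counts, gamma, psi))  # same value
```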
The corresponding (O) and (N) one-step allocation rules under the new $(\gamma, \psi)$ parametrization arise, respectively, from (11) for $m = 1$, $m_j = 1$ and $m_l = 0$ for $l \ne j$, and from (12) for $m = 1$, $k^* = 1$ and $s_1 = 1$, and are given by

$$
(O):\ p_j(n):=\frac{(\gamma+n-k)(n_j+1)}{(\psi+n)(\gamma-\psi+n)},\quad j=1,\dots,k,\qquad (N):\ p_{k+1}(n):=\frac{(k-\psi)(k-\gamma+\psi)}{(\psi+n)(\gamma-\psi+n)}.
$$
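Under $\zeta = \psi(\gamma - \psi)$ these rules should reproduce the original rules of Section 1 exactly, since $n^2 + \gamma n + \zeta = (n+\psi)(n+\gamma-\psi)$ and $k^2 - \gamma k + \zeta = (k-\psi)(k-\gamma+\psi)$. A minimal Python check, with hypothetical function names and an arbitrary admissible pair $\gamma = 1.2$, $\psi = 0.5$:

```python
def rule_zeta(counts, gamma, zeta):
    """Rules (O) and (N) in the (gamma, zeta) parametrization."""
    n, k = sum(counts), len(counts)
    den = n * n + gamma * n + zeta
    return [(n - k + gamma) * (c + 1) / den for c in counts] + [(k * k - gamma * k + zeta) / den]

def rule_psi(counts, gamma, psi):
    """Rules (O) and (N) in the (gamma, psi) parametrization."""
    n, k = sum(counts), len(counts)
    den = (psi + n) * (gamma - psi + n)
    return ([(gamma + n - k) * (c + 1) / den for c in counts]
            + [(k - psi) * (k - gamma + psi) / den])

gamma, psi = 1.2, 0.5
counts = [4, 1, 2]
print(rule_psi(counts, gamma, psi))
print(rule_zeta(counts, gamma, psi * (gamma - psi)))  # the same probabilities
```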



Remark 3. Notice that (13), which is obtained by specializing (19) in Cerquetti (2008), provides, once the notation is made consistent, the explicit form of $p_b(n_1, \dots, n_k)$ in Sect. 7 of Gnedin (2010a) for the two-parameter $(\gamma, \psi)$ model. Conditional distributions of this kind play a significant role in the Bayesian nonparametric treatment of species sampling problems under Gibbs priors (see e.g. Favaro et al., 2009; Lijoi et al., 2007, 2008). We do not deal with such applications here; some results in this direction for the one-parameter $(\gamma)$ Gnedin-Fisher model are in Cerquetti (2010).



3 Mixture representation and the number of occupied boxes 

The fundamental result in Gnedin and Pitman (2006, cf. Th. 12) establishes that the EPPF of each Gibbs partition of genus $\alpha \in (-\infty, 1)$ is a mixture of extreme partition probability functions, which differ for $\alpha \in (-\infty, 0)$, $\alpha = 0$ and $\alpha \in (0, 1)$. Gnedin (2010a) provides the mixing law (2) over Poisson-Dirichlet $(-1, \xi)$ (i.e. $\alpha = -1$) extreme partitions that corresponds to the parametrization (1). Here we derive the mixing law for the reparametrization introduced in Theorem 1 as the limit distribution of the number of blocks, following the approach in Gnedin (2010a). Additionally, by an application of Bayes' theorem, we provide a direct proof that the weights in model (9) actually arise by mixing the extreme PD$(-1, \xi)$ weights over $\xi$.

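First notice that both the prior (6) and the posterior (7) for the number of boxes $K$ of the one-parameter $(\gamma)$ Gnedin-Fisher model may be rewritten, respectively, as

$$
P_\gamma(K=\xi)=\frac{\gamma\,(1-\gamma)_{\xi-1}}{\xi!}=\frac{(1)_{\xi-1}\,(1-\gamma)_{\xi-1}\,(\gamma)_1}{\Gamma(\xi)\,(1)_{\xi}} \tag{14}
$$

and

$$
P_\gamma(K=\xi\mid K_n=k)=\frac{(k)_{\xi-k}\,(k-\gamma)_{\xi-k}\,(n+\gamma-k)_{k}}{(\xi-k)!\,(n)_{\xi}} \tag{15}
$$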

for $\xi = 1, 2, \dots$ in (14) and $\xi = k, k+1, \dots$ in (15), which shows that both belong to the class of shifted univariate generalized Waring distributions (also known as inverse Markov-Pólya distributions). This is a family of distributions on $\mathbb{N} \cup \{0\}$ (Irwin, 1975; Xekalaki, 1983; see also Johnson et al., 2005), whose probability mass function is given by

$$
P(X=i)=\frac{(a)_i\,(\eta)_i\,(\rho)_\eta}{i!\,(a+\rho)_{\eta+i}},\qquad i=0,1,2,\dots,
$$

for positive real parameters $a$, $\eta$, $\rho$; it arises as a Beta$(\rho, a)$ mixture of Negative Binomial NB$(\eta, p)$ distributions, where NB$(\eta, p)$ assigns mass $(\eta)_i\, p^{\eta}(1-p)^i / i!$ to $i$. Hence equations (14) and (15) correspond, respectively, to NB$(1, p)$ mixed over $p \sim$ Beta$(\gamma, 1-\gamma)$, and to NB$(k, p)$ mixed over $p \sim$ Beta$(n+\gamma-k, k-\gamma)$. The probability generating function is, up to a constant, the Gaussian hypergeometric function

$$
{}_2F_1(a,\eta;a+\eta+\rho;z)=\sum_{i=0}^{\infty}\frac{(a)_i\,(\eta)_i}{(a+\eta+\rho)_i}\,\frac{z^i}{i!},
$$

which implies that the generalized Waring distribution is overdispersed and heavy-tailed. Moreover, $E(X^r) < \infty$ if and only if $\rho > r$.
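The Beta mixture representation of the generalized Waring law can be checked numerically. In the Python sketch below (hypothetical names; parameters chosen with $\rho = 3$ so that the mean exists and the truncated sums converge quickly) the mass function is computed from the closed form and from the mixture, using $E[p^\eta(1-p)^i] = B(\rho+\eta, a+i)/B(\rho, a)$ for $p \sim$ Beta$(\rho, a)$. The mean $a\eta/(\rho-1)$ follows from $E[\eta(1-p)/p] = \eta a/(\rho-1)$.

```python
from math import exp, lgamma

def log_rising(x, a):
    """log (x)_a = log Gamma(x + a) - log Gamma(x), real a allowed (x, x + a > 0)."""
    return lgamma(x + a) - lgamma(x)

def log_beta(p, q):
    return lgamma(p) + lgamma(q) - lgamma(p + q)

def gw_pmf(i, a, eta, rho):
    """Generalized Waring mass (a)_i (eta)_i (rho)_eta / (i! (a + rho)_{eta + i})."""
    return exp(log_rising(a, i) + log_rising(eta, i) + log_rising(rho, eta)
               - lgamma(i + 1) - log_rising(a + rho, eta + i))

def nb_beta_pmf(i, a, eta, rho):
    """The same mass as E[(eta)_i / i! * p^eta (1 - p)^i] with p ~ Beta(rho, a)."""
    return exp(log_rising(eta, i) - lgamma(i + 1)
               + log_beta(rho + eta, a + i) - log_beta(rho, a))

a, eta, rho = 1.5, 2.0, 3.0
print(gw_pmf(4, a, eta, rho), nb_beta_pmf(4, a, eta, rho))     # equal up to rounding
print(sum(gw_pmf(i, a, eta, rho) for i in range(20000)))       # close to 1
print(sum(i * gw_pmf(i, a, eta, rho) for i in range(20000)))   # close to a * eta / (rho - 1) = 1.5
```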

The following result shows that the reparametrization introduced in Theorem 1 yields, also for the two-parameter $(\gamma, \psi)$ Gnedin-Fisher model, a representation in terms of a shifted generalized Waring mixture of Fisher $(-1, \xi)$ models.

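Theorem 4. The EPPF in (9) arises by mixing the family of PD$(-1, \xi)$ partition models

$$
p_{\xi,-1}(n_1,\dots,n_k)=\frac{(\xi-1)_{k-1\downarrow}}{(\xi+1)_{n-1}}\prod_{j=1}^{k}n_j!, \tag{16}
$$

where $(x)_{m\downarrow} = x(x-1)\cdots(x-m+1)$ is the falling factorial, over $\xi$, with a shifted generalized Waring distribution of parameters $a = 1 - \gamma + \psi$, $\eta = 1 - \psi$ and $\rho = \gamma$,

$$
P_{\gamma,\psi}(K=\xi)=\frac{(1-\psi)_{\xi-1}\,(1-\gamma+\psi)_{\xi-1}\,(\gamma)_{1-\psi}}{\Gamma(\xi)\,(1+\psi)_{\xi-\psi}},\qquad \xi=1,2,\dots, \tag{17}
$$

for $\psi \in [0, 1)$ and $\gamma \in (0, \psi + 1)$.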

Proof. By the general formula (4) for the law of the number of blocks of Gibbs partitions of genus $\alpha$, and exploiting the definition of Lah numbers, the analogue of (5) for the two-parameter model is

$$
P_{\gamma,\psi}(K_n=k)=\binom{n-1}{k-1}\frac{n!}{k!}\,\frac{(\gamma)_{n-k}\,(1-\psi)_{k-1}\,(1-\gamma+\psi)_{k-1}}{(1+\psi)_{n-1}\,(1+\gamma-\psi)_{n-1}}.
$$

Rewriting in terms of Gamma functions yields

$$
P_{\gamma,\psi}(K_n=k)=\frac{\Gamma(1+\psi)\,\Gamma(1+\gamma-\psi)\,(1-\psi)_{k-1}\,(1-\gamma+\psi)_{k-1}}{\Gamma(k+1)\,\Gamma(k)\,\Gamma(\gamma)}\cdot\frac{\Gamma(n+1)\,\Gamma(n)\,\Gamma(\gamma+n-k)}{\Gamma(n-k+1)\,\Gamma(\psi+n)\,\Gamma(\gamma-\psi+n)}
$$

and, since by Stirling's approximation the second factor tends to 1 as $n \to \infty$, this reduces (renaming $k$ as $\xi$) to

$$
P_{\gamma,\psi}(K=\xi)=\frac{(1-\psi)_{\xi-1}\,(1-\gamma+\psi)_{\xi-1}\,(\gamma)_{1-\psi}}{\Gamma(\xi)\,(1+\psi)_{\xi-\psi}}, \tag{18}
$$

which provides the analogue of Eq. (5) in Gnedin (2010a) for the new parametrization.

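As $\xi \to \infty$, the power-like decay of the masses in (17) (cf. Gnedin, 2010a, Sect. 3) can be written as

$$
P_{\gamma,\psi}(K=\xi)\sim c\,\xi^{-1-\gamma},\qquad c=\frac{\Gamma(1+\psi)\,\Gamma(1+\gamma-\psi)}{\Gamma(1-\psi)\,\Gamma(1-\gamma+\psi)\,\Gamma(\gamma)}.
$$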
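The steps of the proof up to the tail behaviour can be traced numerically. The Python sketch below is illustrative (all helper names are hypothetical; $\gamma = 1.2$, $\psi = 0.5$ is an arbitrary admissible choice): it compares (17) with Gnedin's mixing law (2) for $z_1 = \psi$, $z_2 = \gamma - \psi$, $\zeta = \psi(\gamma - \psi)$, evaluates the Gamma-function form of $P_{\gamma,\psi}(K_n = k)$ for large $n$, and checks the tail $P_{\gamma,\psi}(K = \xi) \sim c\,\xi^{-1-\gamma}$ with $c = \Gamma(1+\psi)\Gamma(1+\gamma-\psi)/(\Gamma(1-\psi)\Gamma(1-\gamma+\psi)\Gamma(\gamma))$.

```python
from math import exp, lgamma, prod
from math import gamma as gamma_fn

def log_rising(x, a):
    """log (x)_a for real a, via log-Gamma."""
    return lgamma(x + a) - lgamma(x)

def mix_law(xi, g, psi):
    """Mixing law (17)."""
    return exp(log_rising(1 - psi, xi - 1) + log_rising(1 - g + psi, xi - 1)
               + log_rising(g, 1 - psi) - lgamma(xi) - log_rising(1 + psi, xi - psi))

def gnedin_law(xi, g, psi):
    """Gnedin's mixing law (2) with z_1 = psi, z_2 = g - psi, zeta = psi (g - psi)."""
    zeta = psi * (g - psi)
    return (gamma_fn(psi + 1) * gamma_fn(g - psi + 1) / gamma_fn(g)
            * prod(i * i - g * i + zeta for i in range(1, xi))
            / (gamma_fn(xi + 1) * gamma_fn(xi)))

def law_kn(n, k, g, psi):
    """P_{gamma,psi}(K_n = k) in the Gamma-function form used in the proof."""
    return exp(lgamma(1 + psi) + lgamma(1 + g - psi) + log_rising(1 - psi, k - 1)
               + log_rising(1 - g + psi, k - 1) - lgamma(k + 1) - lgamma(k) - lgamma(g)
               + lgamma(n + 1) + lgamma(n) + lgamma(g + n - k)
               - lgamma(n - k + 1) - lgamma(psi + n) - lgamma(g - psi + n))

def tail_const(g, psi):
    """Constant c in P(K = xi) ~ c xi^(-1-gamma)."""
    return (gamma_fn(1 + psi) * gamma_fn(1 + g - psi)
            / (gamma_fn(1 - psi) * gamma_fn(1 - g + psi) * gamma_fn(g)))

g, psi = 1.2, 0.5
print([mix_law(x, g, psi) / gnedin_law(x, g, psi) for x in (1, 2, 5)])    # all close to 1
print([law_kn(10**5, k, g, psi) / mix_law(k, g, psi) for k in (1, 2, 3)])  # tend to 1 as n grows
big = 10**5
print(mix_law(big, g, psi) / (tail_const(g, psi) * big ** (-1 - g)))       # close to 1
```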
To show that the weights in (9) actually arise by mixing the weights of the extreme Poisson-Dirichlet $(-1, \xi)$ partitions over $\xi$ with (18), we apply Bayes' theorem. The posterior distribution of $K$ for the reparametrized model may be obtained from the general form, for Gibbs models, of the law of the number $K^*_m$ of new blocks arising in an additional sample of size $m$ (cf. Cerquetti, 2008, eq. (32); see also Lijoi et al., 2007, eq. (4)), which, expressed in terms of non-central generalized Stirling numbers, is given by

$$
P(K^*_m=k^*\mid K_n=k)=\frac{V_{n+m,k+k^*}}{V_{n,k}}\,S_{m,k^*}^{-1,-\alpha,-(n-\alpha k)}. \tag{19}
$$

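For $\alpha = -1$, inserting the specific weights of (9) and exploiting the definition of the non-central Lah numbers, $S_{m,k^*}^{-1,1,-(n+k)} = \binom{m}{k^*}(n+k+k^*)_{m-k^*}$, yields

$$
P_{\gamma,\psi}(K^*_m=k^*\mid K_n=k)=\binom{m}{k^*}(n+k+k^*)_{m-k^*}\,\frac{(\gamma+n-k)_{m-k^*}\,(k-\psi)_{k^*}\,(k-\gamma+\psi)_{k^*}}{(n+\psi)_m\,(n+\gamma-\psi)_m}, \tag{20}
$$

and for $m \to \infty$, by standard Stirling approximations, the posterior of the number of boxes $K$ of the two-parameter $(\gamma, \psi)$ model (with $K = \xi$ corresponding to $k^* = \xi - k$) results

$$
P_{\gamma,\psi}(K=\xi\mid K_n=k)=\frac{(k-\psi)_{\xi-k}\,(k-\gamma+\psi)_{\xi-k}\,(n+\gamma-k)_{k-\psi}}{\Gamma(\xi-k+1)\,(n+\psi)_{\xi-\psi}},\qquad \xi=k,k+1,\dots, \tag{21}
$$

which is still in the class of shifted univariate generalized Waring distributions, with parameters $a = k - \gamma + \psi$, $\eta = k - \psi$ and $\rho = n + \gamma - k$, all positive for $1 \le k \le n$ since $\gamma - \psi < 1$ and $\psi < 1$. Now, by Bayes' theorem, and since the posterior of $K$ given $\Pi_n$ depends on $\Pi_n$ only through $K_n = k$, for every $\xi \ge k$

$$
V_{n,k}^{\gamma,\psi}=\frac{P_{\gamma,\psi}(K=\xi)\,(\xi-1)_{k-1\downarrow}\,/\,(\xi+1)_{n-1}}{P_{\gamma,\psi}(K=\xi\mid K_n=k)},
$$

where the common factor $\prod_{j=1}^{k} n_j!$ of (9) and (16) has been cancelled; therefore, exploiting (18) and (21),

$$
V_{n,k}^{\gamma,\psi}=\frac{(1-\psi)_{\xi-1}(1-\gamma+\psi)_{\xi-1}(\gamma)_{1-\psi}}{\Gamma(\xi)\,(1+\psi)_{\xi-\psi}}\cdot\frac{(\xi-1)_{k-1\downarrow}}{(\xi+1)_{n-1}}\cdot\frac{\Gamma(\xi-k+1)\,(n+\psi)_{\xi-\psi}}{(k-\psi)_{\xi-k}\,(k-\gamma+\psi)_{\xi-k}\,(n+\gamma-k)_{k-\psi}}
$$

and, with the substitution $\xi - k = y$,

$$
V_{n,k}^{\gamma,\psi}=\frac{(1-\psi)_{y+k-1}(1-\gamma+\psi)_{y+k-1}(\gamma)_{1-\psi}}{\Gamma(y+k)\,(1+\psi)_{y+k-\psi}}\cdot\frac{(y+k-1)_{k-1\downarrow}}{(y+k+1)_{n-1}}\cdot\frac{\Gamma(y+1)\,(n+\psi)_{y+k-\psi}}{(k-\psi)_{y}\,(k-\gamma+\psi)_{y}\,(n+\gamma-k)_{k-\psi}}.
$$

Then, by the multiplicative property of rising factorials $(x)_{a+b} = (x)_a (x+a)_b$, together with $(x)_a = \Gamma(x+a)/\Gamma(x)$, the last expression simplifies to

$$
V_{n,k}^{\gamma,\psi}=\frac{(\gamma)_{n-k}\,(1-\gamma+\psi)_{k-1}\,(1-\psi)_{k-1}}{(1+\psi)_{n-1}\,(1+\gamma-\psi)_{n-1}},
$$

which no longer depends on $y$, and the proof is complete. □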
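The ingredients of the Bayes argument can be checked numerically. The Python sketch below is our own illustration with hypothetical names. It enumerates placements of $m$ labelled balls to confirm the non-central Lah numbers $\binom{m}{k^*}(n+k+k^*)_{m-k^*}$, and then verifies that prior (17) times the PD$(-1,\xi)$ weight $(\xi-1)_{k-1\downarrow}/(\xi+1)_{n-1}$, divided by the posterior coded as $(k-\psi)_{\xi-k}(k-\gamma+\psi)_{\xi-k}(n+\gamma-k)_{k-\psi}/((\xi-k)!\,(n+\psi)_{\xi-\psi})$, returns the same weight $V_{n,k}^{\gamma,\psi}$ of (9) for every $\xi \ge k$.

```python
from math import comb, exp, factorial, lgamma, prod

def rising(x, m):
    """Rising factorial (x)_m for integer m >= 0."""
    return prod(x + i for i in range(m))

def log_rising(x, a):
    """log (x)_a for real a, via log-Gamma."""
    return lgamma(x + a) - lgamma(x)

def placements(k, m):
    """All placements of m labelled balls into k old boxes and new boxes,
    as (old_add, new_sizes); each extension of the partition appears once."""
    out = [([0] * k, [])]
    for _ in range(m):
        nxt = []
        for old, new in out:
            nxt += [(old[:j] + [old[j] + 1] + old[j + 1:], new) for j in range(k)]
            nxt += [(old, new[:j] + [new[j] + 1] + new[j + 1:]) for j in range(len(new))]
            nxt.append((old, new + [1]))
        out = nxt
    return out

def lah_brute(counts, m, kstar):
    """Non-central Lah number by brute force: sum of prod (n_j+1)_{m_j} * prod s_j!."""
    return sum(prod(rising(c + 1, a) for c, a in zip(counts, old)) * prod(factorial(s) for s in new)
               for old, new in placements(len(counts), m) if len(new) == kstar)

def post_new(kstar, m, n, k, g, psi):
    """(20): P(K*_m = kstar | K_n = k)."""
    return (comb(m, kstar) * rising(n + k + kstar, m - kstar) * rising(g + n - k, m - kstar)
            * rising(k - psi, kstar) * rising(k - g + psi, kstar)
            / (rising(n + psi, m) * rising(n + g - psi, m)))

def prior(xi, g, psi):
    """Mixing law (17)."""
    return exp(log_rising(1 - psi, xi - 1) + log_rising(1 - g + psi, xi - 1)
               + log_rising(g, 1 - psi) - lgamma(xi) - log_rising(1 + psi, xi - psi))

def posterior(xi, n, k, g, psi):
    """(21): P(K = xi | K_n = k) for xi >= k."""
    return exp(log_rising(k - psi, xi - k) + log_rising(k - g + psi, xi - k)
               + log_rising(n + g - k, k - psi) - lgamma(xi - k + 1)
               - log_rising(n + psi, xi - psi))

def pd_weight(xi, n, k):
    """(16) without prod n_j!: falling (xi-1)_{k-1} divided by (xi+1)_{n-1}."""
    return exp(lgamma(xi) - lgamma(xi - k + 1) - log_rising(xi + 1, n - 1))

def V(n, k, g, psi):
    """Gibbs weights of (9)."""
    return (rising(g, n - k) * rising(1 - psi, k - 1) * rising(1 - g + psi, k - 1)
            / (rising(1 + psi, n - 1) * rising(1 + g - psi, n - 1)))

g, psi, n, k = 1.2, 0.5, 8, 3
print(lah_brute([2, 1, 3], 3, 1), comb(3, 1) * rising(6 + 3 + 1, 2))  # both 330
# prior * PD weight / posterior should not depend on xi and should equal V_{n,k}
print([prior(x, g, psi) * pd_weight(x, n, k) / posterior(x, n, k, g, psi) for x in (3, 5, 9)])
print(V(n, k, g, psi))
```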
We provide an additional result for the two-parameter $(\gamma, \psi)$ Gnedin-Fisher model by exploiting the mixture representation introduced in Theorem 4 to obtain the structural distribution, i.e. the law of the frequency of box $B_1$,

$$
P_1=\lim_{n\to\infty}\frac{|B_{n,1}|}{n}.
$$

Proposition 5. The frequency $P_1$ of box $B_1$ of the $(\gamma, \psi)$ Gnedin-Fisher model has distribution

$$
P_{\gamma,\psi}(P_1\in dy)=(\gamma)_{1-\psi}\,\Gamma(1+\psi)\left[\delta_1(dy)+(1-\psi)(1-\gamma+\psi)\,y\;{}_2F_1(2-\psi,\,2-\gamma+\psi;\,2;\,1-y)\,dy\right],\qquad y\in(0,1], \tag{22}
$$

with ${}_2F_1(a, b; c; x)$ the Gaussian hypergeometric function.

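Proof. By the mixture representation of Theorem 4,

$$
P_{\gamma,\psi}(P_1\in dy)=\sum_{\xi=1}^{\infty}P_{\gamma,\psi}(K=\xi)\,P_\xi(P_1\in dy).
$$

By the theory of the symmetric Dirichlet model, it is known that, given $K = \xi$, $P_1 \sim$ Beta$(2, \xi - 1)$; therefore, since Beta$(2, 0) = \delta_1(dy)$,

$$
P_{\gamma,\psi}(P_1\in dy)=(\gamma)_{1-\psi}\,\Gamma(1+\psi)\,\delta_1(dy)+\sum_{\xi=2}^{\infty}\frac{(1-\psi)_{\xi-1}(1-\gamma+\psi)_{\xi-1}(\gamma)_{1-\psi}\,\Gamma(\xi+1)}{\Gamma(\xi)\,(1+\psi)_{\xi-\psi}\,\Gamma(\xi-1)}\,y(1-y)^{\xi-2}\,dy.
$$

By the change of variable $\xi - 2 = z$,

$$
P_{\gamma,\psi}(P_1\in dy)=(\gamma)_{1-\psi}\,\Gamma(1+\psi)\,\delta_1(dy)+\sum_{z=0}^{\infty}\frac{(1-\psi)_{z+1}(1-\gamma+\psi)_{z+1}(\gamma)_{1-\psi}\,\Gamma(z+3)}{\Gamma(z+1)\,\Gamma(z+2)\,(1+\psi)_{z+2-\psi}}\,y(1-y)^{z}\,dy,
$$

and, since $\Gamma(z+3)/(1+\psi)_{z+2-\psi} = \Gamma(1+\psi)$, $(1-\psi)_{z+1} = (1-\psi)(2-\psi)_z$, $(1-\gamma+\psi)_{z+1} = (1-\gamma+\psi)(2-\gamma+\psi)_z$ and $\Gamma(z+2) = (2)_z$, standard combinatorial calculus gives

$$
\sum_{z=0}^{\infty}\frac{(1-\psi)_{z+1}(1-\gamma+\psi)_{z+1}}{\Gamma(z+1)\,\Gamma(z+2)}(1-y)^{z}=(1-\psi)(1-\gamma+\psi)\,{}_2F_1(2-\psi,\,2-\gamma+\psi;\,2;\,1-y),
$$

and the result follows. □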
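Formula (22) can be compared numerically with the mixture it comes from. In this Python sketch (illustrative; hypothetical names; `hyp2f1` is a plain truncated power series valid for $0 < y \le 1$, not a library routine) the continuous part $(\gamma)_{1-\psi}\Gamma(1+\psi)(1-\psi)(1-\gamma+\psi)\,y\,{}_2F_1(2-\psi, 2-\gamma+\psi; 2; 1-y)$ is evaluated against $\sum_{\xi \ge 2} P_{\gamma,\psi}(K=\xi)\,\xi(\xi-1)\,y(1-y)^{\xi-2}$, where $\xi(\xi-1)y(1-y)^{\xi-2}$ is the Beta$(2, \xi-1)$ density.

```python
from math import exp, lgamma
from math import gamma as gamma_fn

def log_rising(x, a):
    """log (x)_a for real a, via log-Gamma."""
    return lgamma(x + a) - lgamma(x)

def hyp2f1(a, b, c, x, tol=1e-16, max_terms=100_000):
    """Truncated Gauss hypergeometric series 2F1(a, b; c; x) for |x| < 1."""
    term, total = 1.0, 1.0
    for i in range(max_terms):
        term *= (a + i) * (b + i) / ((c + i) * (i + 1)) * x
        total += term
        if abs(term) < tol * abs(total):
            break
    return total

def mix_law(xi, g, psi):
    """Mixing law (17)."""
    return exp(log_rising(1 - psi, xi - 1) + log_rising(1 - g + psi, xi - 1)
               + log_rising(g, 1 - psi) - lgamma(xi) - log_rising(1 + psi, xi - psi))

def atom(g, psi):
    """Mass of delta_1 in (22): (gamma)_{1-psi} Gamma(1+psi), i.e. P(K = 1)."""
    return exp(log_rising(g, 1 - psi)) * gamma_fn(1 + psi)

def density_closed(y, g, psi):
    """Absolutely continuous part of (22)."""
    return (atom(g, psi) * (1 - psi) * (1 - g + psi) * y
            * hyp2f1(2 - psi, 2 - g + psi, 2.0, 1 - y))

def density_mixture(y, g, psi, xi_max=4000):
    """The same density as sum_{xi >= 2} P(K = xi) * Beta(2, xi - 1) density at y."""
    return sum(mix_law(xi, g, psi) * xi * (xi - 1) * y * (1 - y) ** (xi - 2)
               for xi in range(2, xi_max))

g, psi = 1.2, 0.5
for y in (0.2, 0.5, 0.9):
    print(y, density_closed(y, g, psi), density_mixture(y, g, psi))  # the two columns agree
```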



Remark 6. For $\psi = 0$, (22) yields the result in Gnedin (2010a, Sect. 6) for the distribution of the frequency $P_1$ of box $B_1$ for the one-parameter $(\gamma)$ Gnedin-Fisher model. In fact,

$$
{}_2F_1(2,\,2-\gamma;\,2;\,1-y)=\sum_{z=0}^{\infty}\frac{(2-\gamma)_z}{z!}(1-y)^{z},
$$

and, multiplying and dividing by $y^{2-\gamma}$ and exploiting the probability mass function of the Negative Binomial $(2-\gamma, y)$,

$$
\sum_{z=0}^{\infty}\frac{(2-\gamma)_z}{z!}(1-y)^{z}=y^{\gamma-2}\sum_{z=0}^{\infty}\frac{(2-\gamma)_z}{z!}\,y^{2-\gamma}(1-y)^{z}=y^{\gamma-2},
$$

hence, for $y \in (0, 1]$,

$$
P_\gamma(P_1\in dy)=\gamma\,\delta_1(dy)+\gamma(1-\gamma)\,y^{\gamma-1}\,dy.
$$
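The reduction in Remark 6 rests on the identity $y\,{}_2F_1(2, 2-\gamma; 2; 1-y) = y^{\gamma-1}$, which a truncated series confirms numerically. A minimal Python sketch (the `hyp2f1` helper is our own series implementation, valid for $0 < y \le 1$; $\gamma = 0.35$ is an arbitrary value in $(0,1)$):

```python
def hyp2f1(a, b, c, x, tol=1e-16, max_terms=100_000):
    """Truncated Gauss hypergeometric series 2F1(a, b; c; x) for |x| < 1."""
    term, total = 1.0, 1.0
    for i in range(max_terms):
        term *= (a + i) * (b + i) / ((c + i) * (i + 1)) * x
        total += term
        if abs(term) < tol * abs(total):
            break
    return total

g = 0.35
for y in (0.1, 0.4, 0.7, 1.0):
    series = y * hyp2f1(2.0, 2 - g, 2.0, 1 - y)  # continuous part of (22) at psi = 0, divided by gamma (1 - gamma)
    print(y, series, y ** (g - 1))               # the two columns agree
# total mass check: gamma + integral_0^1 gamma (1 - gamma) y^(gamma - 1) dy = gamma + (1 - gamma) = 1
```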